Zipf's Law and Random Texts
Abstract
Random-text models have been proposed as an explanation for the power law relationship between word frequency and rank, the so-called Zipf’s law. They are generally regarded as null hypotheses rather than models in the strict sense. In this context, recent theories of language emergence and evolution assume this law as a priori information with no need of explanation. Here, random texts and real texts are compared through (a) the so-called lexical spectrum and (b) the distribution of words having the same length. It is shown that real texts fill the lexical spectrum much more efficiently and regardless of the word length, suggesting that the meaningfulness of Zipf’s law is high.
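As a rough illustration of the comparison described in the abstract, the sketch below generates a random text by drawing each character independently (a space with a fixed probability, otherwise a uniformly chosen letter), then computes the rank-frequency list, the lexical spectrum (the number of distinct words occurring exactly f times), and the spectrum restricted to words of one length. The alphabet, space probability, text length, seed, and all function names are assumptions chosen for illustration; they are not the settings or code used in the paper.

```python
import random
from collections import Counter

def random_text(n_chars, alphabet="abcd", p_space=0.2, seed=0):
    """Monkey-typing text: each character is a space with probability
    p_space, otherwise a uniformly chosen letter (illustrative model)."""
    rng = random.Random(seed)
    chars = [" " if rng.random() < p_space else rng.choice(alphabet)
             for _ in range(n_chars)]
    return "".join(chars)

def rank_frequency(text):
    """Word frequencies sorted in decreasing order (index 0 is rank 1)."""
    counts = Counter(text.split())
    return sorted(counts.values(), reverse=True)

def lexical_spectrum(text):
    """Map each frequency f to the number of distinct words occurring f times."""
    counts = Counter(text.split())
    return Counter(counts.values())

def spectrum_by_length(text, length):
    """Lexical spectrum restricted to words of the given length."""
    counts = Counter(w for w in text.split() if len(w) == length)
    return Counter(counts.values())

if __name__ == "__main__":
    text = random_text(200_000)
    print("top frequencies:", rank_frequency(text)[:5])
    print("words seen once:", lexical_spectrum(text)[1])
    print("distinct frequencies among length-3 words:",
          len(spectrum_by_length(text, 3)))
```

Passing a tokenized real text through the same three functions would give the quantities to set against the random-text values; how the paper normalizes or plots them is not specified here.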
Similar resources
Random Texts Do Not Exhibit the Real Zipf's Law-Like Rank Distribution
BACKGROUND Zipf's law states that the relationship between the frequency of a word in a text and its rank (the most frequent word has rank 1, the 2nd most frequent word has rank 2, ...) is approximately linear when plotted on a double logarithmic scale. It has been argued that the law is not a relevant or useful property of language because simple random texts - constructed by concatenating random...
Zipf's law and the structure and evolution of languages
By using a vast number of examples in social and economic data including natural languages, George Zipf was able to show an amazingly robust functional form of the rank-frequency plots [11], f ~ 1/r (f for frequency, r for rank), now commonly called Zipf's curve or Zipf's law. George Miller, a renowned linguist, summarized this study in 1965: Faced with this massive statistical regularity, you have...
Random texts exhibit Zipf's-law-like word frequency distribution
It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. The facts that the frequency of occurrence of a word is almost an inverse power law function of its rank and the exponent of this inverse power law is very close to 1 are largely due to the transformation from the word's length to it...
Large-Scale Analysis of Zipf’s Law in English Texts
Despite being a paradigm of quantitative linguistics, Zipf's law for words suffers from three main problems: its formulation is ambiguous, its validity has not been tested rigorously from a statistical point of view, and it has not been confronted with a representatively large number of texts. So, we can summarize the current support of Zipf's law in texts as anecdotal. We try to solve these issu...
Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts
Zipf's law is a fundamental paradigm in the statistics of written and spoken natural language as well as in other communication systems. We raise the question of the elementary units for which Zipf's law should hold in the most natural way, studying its validity for plain word forms and for the corresponding lemma forms. We analyze several long literary texts comprising four languages, with dif...
Journal title:
- Advances in Complex Systems
Volume 5
Publication date: 2002